*Processes Input-Ouput tabkes

*****************************************************************************

* Import correspondence tables
import delimited "${input_r}sources\Input_Output\passage_a137.csv", delimiter(";") varnames(1) clear 

* Modify the code_a138 variable to remove the first character
replace code_a138 = substr(code_a138, 2, length(code_a138))

keep if code_a138 != ""

keep code_a138 naf_rev2

save "${input_r}\sources\Input_Output\tab_passage", replace

***********************************

* Import the Input-Ouput table 2017
import delimited "${input_r}\sources\Input_Output\io_2017.csv", delimiter(";") varnames(1) clear 

rename * var*
rename varbranches row_from
reshape long var, i(row_from) j(col_to) string

rename (row_from col_to var) (from to value)
replace to = upper(to)

* By sector of arrival, calculate total consumption values to create shares
bys to: egen total = total(value)
gen share = value / total

* Validate a specific example of share and value calculations for a given sector
* Check sector F42Z (Genie civil) from sector C25A (Fabrication d'elements en metal)
br if to == "F42Z" & from == "C25A"
* Expected share: 7.18%, value: 1934, total: around 26000

* Generate a count variable to check for duplicates within 'to' and 'from' sectors
bys to from: gen nb = _N
su nb, d 
* Drop the count variable as no duplicates exist
drop nb

* Join with tab_passage data to map the code_a138 to NAF rev2 codes
rename to code_a138
joinby code_a138 using "${input_r}\sources\Input_Output\tab_passage"
rename code_a138 to 

* Check for duplicates after joining the data
bys to from: gen nb = _N
su nb, d 
br if nb >= 2

* Adjust the value variable to maintain the same total by dividing by the number of duplicates
replace value = value / nb
drop nb

sort to from naf

*****************
* Aggregating for cases where multiple A138 codes correspond to the same NAF code
bys from naf: gen nb = _N
su nb, d
br if nb >= 2

* Collapse data by summing values, grouping by 'from' and 'naf'
count
collapse (sum) value, by(from naf)
count

rename naf to
sort to from

*****************
* Map the 'from' sector back to NAF2 using tab_passage
rename from code_a138
joinby code_a138 using "${input_r}\sources\Input_Output\tab_passage"
rename code_a138 from 

* Check for duplicates once more
bys to from: gen nb = _N
su nb, d 
br if nb >= 2

* Adjust the value variable again to maintain the same total
replace value = value / nb
drop nb

* Sort data by 'to', 'from', and 'naf'
sort to from naf

*****************
* Aggregate data where multiple A138 codes correspond to the same NAF2 code
bys to naf: gen nb = _N
su nb, d
br if nb >= 2

count
collapse (sum) value, by(to naf)
count

* Calculate shares again at the NAF2 level by summing consumption values
bys to: egen total = total(value)
gen share = value / total

* Rename the naf variable back to 'from'
rename naf from 

* Validate a specific example to ensure accuracy
br if to == 42 & from == 251

* Keep only relevant columns for the final dataset
keep from to share

* Display summary statistics of the 'share' variable
su share, d

* Save the final dataset in Stata format
save "${input_r}\sources\Input_Output\io_2017.dta", replace

************************************
